257 research outputs found

    Multimodal music information processing and retrieval: survey and future challenges

    To improve performance on various music information processing tasks, recent studies exploit different modalities that capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application they address. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.

    Frequency Estimation Of The First Pinna Notch In Head-Related Transfer Functions With A Linear Anthropometric Model

    The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane and build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
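    The linear modelling idea can be sketched as an ordinary least-squares fit of the N1 frequency against a set of pinna measurements. Everything below (subject counts, parameter ranges, the synthetic data) is illustrative, not the authors' dataset or exact model:

    ```python
    import numpy as np

    # Hypothetical sketch: fit the N1 notch frequency (Hz) as a linear
    # combination of pinna anthropometric parameters via least squares.
    # Data are synthetic and exactly linear, so the fit recovers it perfectly.
    rng = np.random.default_rng(0)

    n_subjects, n_params = 40, 8                 # eight parameters suffice for N1
    X = rng.uniform(0.5, 3.0, (n_subjects, n_params))   # pinna distances (cm)
    true_w = rng.uniform(-500.0, 500.0, n_params)
    f_n1 = 7000.0 + X @ true_w                   # synthetic N1 frequencies

    # Augment with a bias column and solve the least-squares problem.
    A = np.hstack([X, np.ones((n_subjects, 1))])
    w, *_ = np.linalg.lstsq(A, f_n1, rcond=None)

    pred = A @ w
    rmse = np.sqrt(np.mean((pred - f_n1) ** 2))  # model error on the fit set
    ```

    With real measurements the residual would of course be non-zero, and the paper's point is precisely that such a residual stays psychoacoustically acceptable for N1 but not for N2 or N3.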

    Relative Auditory Distance Discrimination With Virtual Nearby Sound Sources

    This paper describes a psychophysical experiment exploring relative distance discrimination thresholds with binaurally rendered virtual sound sources in the near field. Pairs of virtual sources are spatialized around 6 different spatial locations (2 directions × 3 reference distances) through a set of generic far-field Head-Related Transfer Functions (HRTFs) coupled with a near-field correction model proposed in the literature, known as DVF (Distance Variation Function). Individual discrimination thresholds for each spatial location and for each of the two orders of presentation of stimuli (approaching or receding) are calculated on 20 subjects through an adaptive procedure. Results show that thresholds are higher than those reported in the literature for real sound sources, and that approaching and receding stimuli behave differently. In particular, when the virtual source is close (< 25 cm), thresholds for the approaching condition are significantly lower than those for the receding condition, while the opposite behaviour appears at greater distances (~ 1 m). We hypothesize that this asymmetric bias is due to variations in the absolute stimulus level.
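    An adaptive threshold procedure of the kind mentioned above can be sketched as a transformed up-down staircase. The psychometric function, step size, and stopping rule below are invented for illustration; the paper does not specify its exact staircase parameters here:

    ```python
    import random

    # Illustrative two-down/one-up staircase converging to ~70.7% correct.
    # 'level' is the stimulus difference (e.g. distance difference in cm).
    def two_down_one_up(correct_prob, start=10.0, step=1.0, n_reversals=8):
        level, direction, reversals, consecutive = start, -1, [], 0
        while len(reversals) < n_reversals:
            correct = random.random() < correct_prob(level)
            if correct:
                consecutive += 1
                if consecutive == 2:            # two correct in a row -> harder
                    consecutive = 0
                    if direction == +1:
                        reversals.append(level)  # turning point: was going up
                    direction = -1
                    level = max(level - step, 0.1)
            else:
                consecutive = 0
                if direction == -1:
                    reversals.append(level)      # turning point: was going down
                direction = +1
                level += step
        return sum(reversals) / len(reversals)   # threshold estimate

    random.seed(3)
    # Hypothetical psychometric function: performance grows with the difference.
    threshold = two_down_one_up(lambda d: min(0.99, 0.55 + 0.05 * d))
    ```

    Averaging the reversal levels is one common estimator; fitting a full psychometric function to all trials is another.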

    Direct transcription of low-thrust trajectories with finite trajectory elements

    This paper presents a novel approach to the design of low-thrust trajectories, based on a first-order approximated analytical solution of Gauss' planetary equations. This analytical solution is shown to have better accuracy than a second-order explicit numerical integrator, at a lower computational cost. Hence, it can be employed for the fast propagation of perturbed Keplerian motion when moderate accuracy is required. The analytical solution was integrated into a direct transcription method based on a decomposition of the trajectory into direct finite perturbative elements (DFPET). DFPET were applied to the solution of two-point boundary transfer problems. Furthermore, the paper presents an example of the use of DFPET for the solution of a multiobjective trajectory optimisation problem in which both the total ∆V and the transfer time are minimized with respect to departure and arrival dates. Two transfer problems were used as test cases: a direct transfer from Earth to Mars and a spiral from a low Earth orbit to the International Space Station.
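    To give a flavour of first-order analytical propagation under small perturbing thrust (a drastically simplified stand-in for the paper's DFPET formulation, not the authors' solution), consider the well-known Gauss variational rate of the semi-major axis for a near-circular orbit under a tangential acceleration f_t, da/dt ≈ 2 f_t √(a³/μ), propagated over one short trajectory element:

    ```python
    import math

    # Minimal first-order propagation of the semi-major axis over one element.
    # Valid only for small f_t, short dt, and a near-circular orbit.
    MU_EARTH = 3.986004418e14          # gravitational parameter, m^3/s^2

    def propagate_sma(a0, f_t, dt):
        """First-order update: a1 = a0 + (da/dt)|_{a0} * dt."""
        return a0 + 2.0 * f_t * math.sqrt(a0 ** 3 / MU_EARTH) * dt

    # Low Earth orbit (~400 km altitude), 1 mN/kg thrust, 10-minute element
    a1 = propagate_sma(6778e3, 1e-3, 600.0)   # raises the orbit by ~1 km
    ```

    Chaining many such short elements, with the full set of element rates rather than only the semi-major axis, is the basic idea behind decomposing a low-thrust trajectory into finite perturbative elements.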

    Improving elevation perception with a tool for image-guided head-related transfer function selection

    This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The proposed procedure is designed to be quickly applied and easy to use for a user without previous knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predicted localization performance for selected HRTFs compared to the KEMAR HRTF, which is a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications to refine the perceptually motivated metric that guides the selection.
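    The selection step can be sketched as a nearest-neighbour search over notch-frequency tracks: given the N1 frequencies predicted from the subject's pinna image at a few elevations, pick the database subject whose measured track is closest. The database values and the RMS mismatch metric below are placeholders, not the paper's actual data or metric:

    ```python
    import numpy as np

    def select_hrtf(predicted_n1, database_n1):
        """predicted_n1: (n_elevations,) notch frequencies in Hz.
        database_n1: (n_subjects, n_elevations) measured notch tracks.
        Returns the index of the best-matching database subject."""
        err = np.sqrt(np.mean((database_n1 - predicted_n1) ** 2, axis=1))
        return int(np.argmin(err))

    # Toy database: three subjects, N1 tracks at three elevations (Hz)
    db = np.array([[7000.0, 7500.0, 8000.0],
                   [6500.0, 7000.0, 7600.0],
                   [7200.0, 7700.0, 8300.0]])
    best = select_hrtf(np.array([7150.0, 7680.0, 8250.0]), db)   # -> 2
    ```

    A perceptually motivated metric would weight mismatches by their audibility rather than treating all frequency deviations equally, which is the refinement direction the abstract alludes to.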

    Evidence of Lateralization Cues in Grand and Upright Piano Sound

    In a previous experiment we measured the subjective perception of auditory lateralization in listeners who were exposed to binaural piano tone reproductions under different conditions (normal or reversed-channel listening, manual or automatic tone production by a Disklavier, and disclosure or hiding of the keys while they moved autonomously during the automatic production of a tone). In this way, participants were engaged in a localization task under conditions also involving visual as well as proprioceptive (that is, relative to the position and muscular effort of their body parts) identification of the audio source with the moving key, even when the binaural feedback was reversed. Their answers, however, were clustered on a limited region of the keyboard when the channels were not reversed, and the same region became especially narrow when the channels were reversed. In this paper we report on an acoustic analysis of the localization cues conducted on the stimuli used in the aforementioned experiment. This new analysis employs a computational auditory model of sound localization cues in the horizontal plane. Results suggest that listeners used interaural level difference cues to localize the sound source, and that the contribution of visual and proprioceptive cues to the localization task was limited, especially when the channels were reversed.
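    The interaural level difference (ILD) cue at the core of this analysis is simply the power ratio between the two binaural channels expressed in decibels. A minimal sketch, with synthetic sinusoids standing in for the recorded piano tones (a full auditory model would compute ILDs per critical band):

    ```python
    import numpy as np

    def ild_db(left, right, eps=1e-12):
        """Broadband ILD in dB; positive means the left channel is louder."""
        pl = np.mean(left ** 2)
        pr = np.mean(right ** 2)
        return 10.0 * np.log10((pl + eps) / (pr + eps))

    # 0.1 s of a 440 Hz tone at 44.1 kHz; right channel at half amplitude
    t = np.linspace(0.0, 0.1, 4410, endpoint=False)
    left = np.sin(2 * np.pi * 440 * t)
    right = 0.5 * np.sin(2 * np.pi * 440 * t)
    ild = ild_db(left, right)          # halving the amplitude gives ~6 dB
    ```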

    Auditory navigation with a tubular acoustic model for interactive distance cues and personalized head-related transfer functions: an auditory target-reaching task

    This paper presents a novel spatial auditory display that combines a virtual environment based on a Digital Waveguide Mesh (DWM) model of a small tubular shape with a binaural rendering system using personalized head-related transfer functions (HRTFs), allowing interactive selection of absolute 3D spatial cues of direction as well as egocentric distance. The tube metaphor in particular minimizes loudness changes with distance, providing mainly direct-to-reverberant and spectral cues. The proposed display was assessed through a target-reaching task in which participants explore a 2D virtual map with a pen tablet and hit a sound source (the target) using auditory information only; subjective time to hit and traveled distance were analyzed for three experiments. The first aimed at assessing the proposed HRTF selection method for personalization and the dimensionality of the reaching task, with particular attention to elevation perception; we show that most subjects performed better when they had to reach a vertically unbounded (2D) rather than an elevated (3D) target. The second experiment analyzed the interaction between the tube metaphor and HRTFs, showing a dominant effect of the DWM model over binaural rendering. In the last experiment, participants using absolute distance cues from the tube model performed comparably well to when they could rely on more robust, although relative, intensity cues. These results suggest that participants made proficient use of both binaural and reverberation cues during the task, displayed as part of a coherent 3D sound model, in spite of the known complexity of using both such cues. HRTF personalization was beneficial for participants who were able to perceive the vertical dimension of a virtual sound. Further work is needed to add full physical consistency to the proposed auditory display.
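    A digital waveguide model of a tube, in its simplest 1-D form (a far cry from the paper's 3-D mesh, but showing the same principle), is a pair of delay lines carrying the right- and left-going pressure waves with reflections at both ends. All parameters below are illustrative:

    ```python
    import numpy as np

    def waveguide_tube(n_sections=50, n_steps=200, reflect=-0.95):
        """1-D digital waveguide: inject an impulse at one end, read the
        pressure at the far end. reflect < 1 models lossy open/closed ends."""
        right = np.zeros(n_sections)   # right-going travelling wave
        left = np.zeros(n_sections)    # left-going travelling wave
        right[0] = 1.0                 # impulse injected at the near end
        out = []
        for _ in range(n_steps):
            r_end, l_end = right[-1], left[0]   # samples hitting the ends
            right[1:] = right[:-1]              # propagate one sample right
            left[:-1] = left[1:]                # propagate one sample left
            right[0] = reflect * l_end          # reflection at the near end
            left[-1] = reflect * r_end          # reflection at the far end
            out.append(right[-1] + left[-1])    # pressure at the far end
        return np.array(out)

    p = waveguide_tube()
    # The impulse arrives after n_sections - 1 steps, then echoes decay.
    ```

    In the display, moving the listening point along such a model changes the direct-to-reverberant ratio and spectral colouration rather than overall loudness, which is exactly the distance cue the tube metaphor exploits.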

    BIVIB: A Multimodal Piano Sample Library Of Binaural Sounds And Keyboard Vibrations

    An extensive piano sample library consisting of binaural sounds and keyboard vibration signals is made available through an open-access data repository. Samples were acquired with high-quality audio and vibration measurement equipment on two Yamaha Disklavier pianos (one grand and one upright model) by means of computer-controlled playback of each key at ten different MIDI velocity values. The nominal specifications of the equipment used in the acquisition chain are reported in a companion document, allowing researchers to calculate physical quantities (e.g., acoustic pressure, vibration acceleration) from the recordings. Project files are also provided for straightforward playback in a free software sampler available for Windows and Mac OS systems. The library is especially suited for acoustic and vibration research on the piano, as well as for research on multimodal interaction with musical instruments.
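    Recovering physical quantities from such recordings amounts to undoing the acquisition chain: divide out the preamp gain, then the transducer sensitivity. The sensitivity and gain figures below are hypothetical placeholders, not the specifications of the library's actual equipment:

    ```python
    import math

    def sample_to_pascal(x, mic_sens_mv_per_pa=12.5, gain_db=20.0,
                         adc_fs_volts=1.0):
        """Convert a full-scale-normalized sample x in [-1, 1] to acoustic
        pressure in pascals, given hypothetical chain specifications."""
        volts = x * adc_fs_volts / (10 ** (gain_db / 20.0))   # undo preamp gain
        return volts / (mic_sens_mv_per_pa * 1e-3)            # undo sensitivity

    p = sample_to_pascal(0.25)                    # a sample peaking at -12 dBFS
    spl = 20 * math.log10(abs(p) / 20e-6)         # re 20 µPa reference
    ```

    The same pattern applies to the vibration channels, with accelerometer sensitivity (mV per m/s²) in place of microphone sensitivity.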